DISCO: Describing Images Using Scene Contexts and Objects
Abstract
In this paper, we propose a bottom-up approach to generating short descriptive sentences from images, to enhance scene understanding. We demonstrate automatic methods for mapping the visual content in an image to natural spoken or written language. We also introduce a human-in-the-loop evaluation strategy that quantitatively captures the meaningfulness of the generated sentences. We recorded a correctness rate of 60.34% when human users were asked to judge the meaningfulness of the sentences generated from relatively challenging images. Also, our automatic methods compared well with the state-of-the-art techniques for the related computer vision tasks.
Similar resources
Color scene transform between images using Rosenfeld-Kak histogram matching method
In digital color imaging, it is of interest to transfer the color scene of one image to another. Several attempts have been made using, for example, the lαβ color space, principal component analysis and, more recently, the histogram rescaling method. In this research, a novel method is proposed based on the Rosenfeld and Kak histogram matching algorithm. It is suggested that to transform the color...
Heightened Responses of the Parahippocampal and Retrosplenial Cortices during Contextualized Recognition of Congruent Objects
Context sometimes helps make objects more recognizable. Previous studies using functional magnetic resonance imaging (fMRI) have examined regional neural activity when objects have strong or weak associations with their contexts. Such studies have demonstrated that activity in the parahippocampal cortex (PHC) generally corresponds with strong associations between objects and their spatial conte...
DisCo: Displays that Communicate
We present DisCo, a novel display-camera communication system. DisCo enables displays and cameras to communicate with each other, while also displaying and capturing images for human consumption. Messages are transmitted by temporally modulating the display brightness at high frequencies so that they are imperceptible to humans. Messages are received by a rolling shutter camera which converts t...
Generating Image Captions using Topic Focused Multi-document Summarization
In the near future digital cameras will come standardly equipped with GPS and compass and will automatically add global position and direction information to the metadata of every picture taken. Can we use this information, together with information from geographical information systems and the Web more generally, to caption images automatically? This challenge is being pursued in the TRIPOD pr...
An Evaluation on Color Invariant Based Local Spatiotemporal Features for Action Recognition
Despite recent advances in the design of features to improve automated human action recognition, color information has so far been overlooked. Nevertheless, color has been proven an important element to the success of automated recognition of objects/scenes and segmentation. For object and scene recognition in static images, robustness to photometric variations has been achieved by describing l...